data = read_excel("Chapter3_exercises_data.xlsx", sheet = "Exercise 1")
## New names:
## • `` -> `...4`
log_RC <- log(data$rpce)
log_I <- log(data$rdpi)
Growth_RC <- diff(log_RC)
Growth_I <- diff(log_I)
plot(data$date[-1],Growth_RC, type = "l", col = "blue", xlab = "Year", ylab = "Growth Rate (Real Consumption)", main = "Growth Rate of Real Consumption")
plot(data$date[-1],Growth_I, type = "l", col = "red", xlab = "Year", ylab = "Growth Rate (Income)", main = "Growth Rate of Disposable Income")
Growth
rate of real consumption is less volatile than growth rate of disposable
income, probably because the permanent income model relates to this
phenomenon by postulating that current and expected future income levels
(together lifetime income) drives consumption (expenditure) patterns,
but is smoothed over time. So if someone has an increase in income, they
will smooth that gain over their lifetime and not spend it
proportionally immediately. Thus in this example, one would change their
consumption in magnitude less in response to an the income change. This
data is evidence of the permanent income hypothesis.
model <- lm(Growth_RC ~ Growth_I)
# Print the summary of the regression model
summary(model)
##
## Call:
## lm(formula = Growth_RC ~ Growth_I)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0304050 -0.0029792 0.0001606 0.0030383 0.0244504
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0022543 0.0002242 10.056 < 2e-16 ***
## Growth_I 0.1745175 0.0292014 5.976 3.8e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.005317 on 637 degrees of freedom
## Multiple R-squared: 0.05309, Adjusted R-squared: 0.05161
## F-statistic: 35.72 on 1 and 637 DF, p-value: 3.799e-09
The estimated linear equation is that Growth of consumption = 0.0022543 + 0.1745175 Growth of Income. This suggests that a disposable income growth expects a positive change in consumption. The t values and p values suggest statistical significance, so at the 95% level income growth appears to positively drive expenditure. This R2 score is also very low, meaning that our independent variable of income growth only accounts for about 5% of total variation. Our coefficient of rdpi_growth means that a 1% growth in income is expected to give a 0.17% growth in consumption. Because 0.17% < 1%, this aligns with the permanent income hypothesis.
Growth_I_lag <- lag(Growth_I)
lr_growth_lag <- lm(Growth_RC~Growth_I+Growth_I_lag)
summary(lr_growth_lag)
##
## Call:
## lm(formula = Growth_RC ~ Growth_I + Growth_I_lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0300818 -0.0028874 -0.0000051 0.0029768 0.0255088
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0019889 0.0002405 8.269 7.90e-16 ***
## Growth_I 0.1872175 0.0293870 6.371 3.61e-10 ***
## Growth_I_lag 0.0828418 0.0293865 2.819 0.00497 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.005284 on 635 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.06476, Adjusted R-squared: 0.06182
## F-statistic: 21.99 on 2 and 635 DF, p-value: 5.855e-10
With the lagged consumption growth we do see a small increase in the Growth_I coefficient, as well as a positive coefficient for the lag parameter. The actual coefficient of Growth_I_lag means that a 1% increase in income in the previous period is expected to give a 0.08% increase in consumption in the current period. The t values and p values of the intercept and the Growth_I variable remain to suggest significance, but the lagged parameter is just passing by at the 95% significance level. This finding does not present strong evidence that last periods growth in income has a significant effect on consumption pattern, which coincides with the permanent income hypothesis. Also, the Adjusted R2 rose to 0.062, but this increase is not notably large.
real_gdp_data <- read_excel("Chapter3_exercises_data.xlsx", sheet = "Exercise 3a")
## New names:
## • `` -> `...3`
read_gdp_mean <- mean(real_gdp_data$rgdp)
ggplot(data=real_gdp_data, mapping=aes(date, rgdp)) +
geom_line(color='blue', lwd=1) +
geom_hline(yintercept=read_gdp_mean, linetype='dashed') +
ggtitle('US Real GDP') +
xlab('Year') +
ylab('RGDP')
Definition: Value of goods and services produced in the
US adjusted for inflation.
Periodicity: Quarters, 1947-2012.
Units: USD billions chain weighted.
Stationary: There is a clear upward trend with some
small local dips and peaks, so this time series is not first (second)
order weakly stationary.
exchange_rate <- read_excel("Chapter3_exercises_data.xlsx", sheet="Exercise 3b")
## New names:
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
exchange_rate_mean <- mean(exchange_rate$jpy_usd)
ggplot(data=exchange_rate, mapping=aes(DATE, jpy_usd)) +
geom_line(color='blue', lwd=1) +
geom_hline(yintercept=exchange_rate_mean, linetype='dashed') +
ggtitle('Exchange Rate of Yen vs USD') +
xlab('Year') +
ylab('Rate')
Definition: The value of yen (foreign currency) that is
equal to 1 USD.
Periodicity: Monthly, 1971-01-04 to 2012-06-01.
Units: Rate of Yen to 1 USD.
Stationary: There is a clear downward trend with some
small and moderate local dips and peaks, so this time series is also not
first (second) order weakly stationary.
maturity_yield <- read_excel("Chapter3_exercises_data.xlsx", sheet="Exercise 3c")
## New names:
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
maturity_yield_mean <- mean(maturity_yield$CMRate10Yr, na.rm=TRUE)
# Removing zero values under assumption that these should be NA.
maturity_yield[maturity_yield==0] <- NA
ggplot(data=maturity_yield, mapping=aes(DATE, CMRate10Yr)) +
geom_line(color='blue', lwd=1) +
geom_hline(yintercept=maturity_yield_mean, linetype='dashed') +
ggtitle('10-year Treasury Constant Maturity Yield') +
xlab('Rate') +
ylab('Year')
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
Definition: Yields on actively traded
non-inflation-indexed issues adjusted to constant maturities.
Periodicity: Daily, 1962-01-02 to 2012-06-07.
Units: Rate.
Stationary: This plot is less clear in respect to any
trend. Before the mid 1980’s there is an upward trend, but after there
is a downward trend. There is does not appear to be a meaningful mean of
this series nor is there a seemingly constant degree of variance in the
cycles. This series is doubtful to be first (second) order weakly
stationary.
unemployment <- read_excel("Chapter3_exercises_data.xlsx", sheet="Exercise 3d")
## New names:
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
unemployment_mean <- mean(unemployment$unemrate)
ggplot(data=unemployment, mapping=aes(DATE, unemrate)) +
geom_line(color='blue', lwd=1) +
geom_hline(yintercept=unemployment_mean, linetype='dashed') +
ggtitle('US Unemployment Rate') +
xlab('Year') +
ylab('Rate')
Definition: The percent of unemployed people over the
labor force. The US Labor force includes those 16 years of age and up,
not in institutions, not on active military duty, residing in the United
States.
Periodicity: Monthly, 1948-01-01 to 2012-05-01.
Units: Rate.
Stationary: This plot has an overall upward trend, but
does in fact fluctuate about the mean more than the previous series. It
is unclear if this is first order weakly stationary. Since the variances
are more obviously not constant, I would be confident enough to at least
claim that it is not second order weakly stationary.
data = read_excel("3.5.xlsx")
data
plot(data$Date,data$GDP, type = "l", col = "blue", xlab = "Date", ylab = "GDP", main = "GDP trend")
The
underlying stochastic process is not weakly stationary. The upward trend
indicates that the process must have different means in different
periods of time, so that it is not first order stationary.
# Calculate the growth rate of nominal GDP
data$g1t <- 100 * ((data$GDP-lag(data$GDP)) / lag(data$GDP))
# Print the updated table with the new GDP growth column
print(data)
## # A tibble: 16 × 4
## Date GDP `ln(GDP)` g1t
## <dttm> <dbl> <dbl> <dbl>
## 1 2001-01-01 00:00:00 10022. 9.21 NA
## 2 2001-04-01 00:00:00 10129. 9.22 1.07
## 3 2001-07-01 00:00:00 10135. 9.22 0.0612
## 4 2001-10-01 00:00:00 10226. 9.23 0.900
## 5 2002-01-01 00:00:00 10338. 9.24 1.09
## 6 2002-04-01 00:00:00 10446. 9.25 1.04
## 7 2002-07-01 00:00:00 10546. 9.26 0.965
## 8 2002-10-01 00:00:00 10618. 9.27 0.673
## 9 2003-01-01 00:00:00 10745. 9.28 1.20
## 10 2003-04-01 00:00:00 10884 9.30 1.30
## 11 2003-07-01 00:00:00 11117. 9.32 2.14
## 12 2003-10-01 00:00:00 11271. 9.33 1.39
## 13 2004-01-01 00:00:00 11473. 9.35 1.79
## 14 2004-04-01 00:00:00 11658. 9.36 1.61
## 15 2004-07-01 00:00:00 11815. 9.38 1.35
## 16 2004-10-01 00:00:00 11995. 9.39 1.52
plot(data$Date, data$`ln(GDP)`, type = "l", col = "blue", xlab = "Date", ylab = "ln(GDP)", main = "ln GDP trend")
The
logarithmic transformation helps to stabilize the variance. The figures
above show that the log transformation does not affect the trending
behavior of the GDP series, and therefore, yt is not first order
stationary but it is smoother than the original GDP series.
# Calculate the growth rate of nominal GDP
data$g2t <- 100 * ((data$`ln(GDP)`-lag(data$`ln(GDP)`)))
# Print the updated table with the new GDP growth column
print(data)
## # A tibble: 16 × 5
## Date GDP `ln(GDP)` g1t g2t
## <dttm> <dbl> <dbl> <dbl> <dbl>
## 1 2001-01-01 00:00:00 10022. 9.21 NA NA
## 2 2001-04-01 00:00:00 10129. 9.22 1.07 1.07
## 3 2001-07-01 00:00:00 10135. 9.22 0.0612 0.0612
## 4 2001-10-01 00:00:00 10226. 9.23 0.900 0.896
## 5 2002-01-01 00:00:00 10338. 9.24 1.09 1.09
## 6 2002-04-01 00:00:00 10446. 9.25 1.04 1.03
## 7 2002-07-01 00:00:00 10546. 9.26 0.965 0.960
## 8 2002-10-01 00:00:00 10618. 9.27 0.673 0.671
## 9 2003-01-01 00:00:00 10745. 9.28 1.20 1.19
## 10 2003-04-01 00:00:00 10884 9.30 1.30 1.29
## 11 2003-07-01 00:00:00 11117. 9.32 2.14 2.12
## 12 2003-10-01 00:00:00 11271. 9.33 1.39 1.38
## 13 2004-01-01 00:00:00 11473. 9.35 1.79 1.77
## 14 2004-04-01 00:00:00 11658. 9.36 1.61 1.60
## 15 2004-07-01 00:00:00 11815. 9.38 1.35 1.34
## 16 2004-10-01 00:00:00 11995. 9.39 1.52 1.51
From the third and the fifth columns of the table, we observe that there are not significant differences between g1t and g2t, so that the log-difference used in d. is a good approximation to compute growth rates.
Data <- read_excel("Chapter3_exercises_data.xlsx", sheet="Exercise 7")
Data$pt <- log(Data$`SP500 Index`)
Data$Daily_return <- (Data$pt - lag(Data$pt))
Data #Daily return shown as the fourth column of data
# Compute sample moments
mean_return <- mean(Data$Daily_return, na.rm = TRUE)
variance_return <- var(Data$Daily_return, na.rm = TRUE)
skewness_return <- moments::skewness(Data$Daily_return, na.rm = TRUE)
kurtosis_return <- moments::kurtosis(Data$Daily_return, na.rm = TRUE)
# Print the computed sample moments
cat("Mean of Daily Returns:", mean_return, "\n")
## Mean of Daily Returns: 3.213237e-05
cat("Variance of Daily Returns:", variance_return, "\n")
## Variance of Daily Returns: 0.000206988
cat("Skewness of Daily Returns:", skewness_return, "\n")
## Skewness of Daily Returns: -0.341148
cat("Kurtosis of Daily Returns:", kurtosis_return, "\n")
## Kurtosis of Daily Returns: 11.36724
# Plot histogram of Daily returns
hist(Data$Daily_return, breaks = 30, main = "Histogram of Daily Returns", xlab = "Daily Returns")
# Create lagged versions of Daily Return
Data$Rt_minus_1 <- lag(Data$Daily_return)
Data$Rt_minus_2 <- lag(Data$Rt_minus_1)
Data$Rt_minus_3 <- lag(Data$Rt_minus_2)
Data$Rt_minus_4 <- lag(Data$Rt_minus_3)
# Plot Rt against Rt−1, Rt−2, Rt−3, and Rt−4
plot(Data$Rt_minus_1, Data$Daily_return, main = "Rt vs. Rt−1", xlab = "Rt−1", ylab = "Rt", col = "blue")
plot(Data$Rt_minus_2, Data$Daily_return, main = "Rt vs. Rt−2", xlab = "Rt−2", ylab = "Rt", col = "green")
plot(Data$Rt_minus_3, Data$Daily_return, main = "Rt vs. Rt−3", xlab = "Rt−3", ylab = "Rt", col = "red")
plot(Data$Rt_minus_4, Data$Daily_return, main = "Rt vs. Rt−4", xlab = "Rt−4", ylab = "Rt", col = "purple")
I cannot
discern any pattern in any of the four graphs.
private = us_employment %>%
filter(Title == "Total Private") %>%
ungroup()
autoplot(private, Employed) + labs(title="Total Private Employed")
gg_season(private, Employed) + labs(title="Total Private Employed")
gg_subseries(private, Employed) + labs(title="Total Private Employed")
gg_lag(private, Employed) + labs(title="Total Private Employed")
autoplot(ACF(private, Employed)) + labs(title="Total Private Employed")
By focusing on the first graph (autoplot), we indeed could find out a
strong upward trend and some seasonality, as it fluctuates regularly.
This is also proven by looking at the ACF plot. However, by looking at
the seasonal plot, the curves are quite flat, which means there might
not be strong seasonality appearing. For employment data, this pattern
might be good. We can also see that the fluctuation in autoplot is
small, which corresponds to this. It seems that there is some cyclicity
here, as the line indeed goes up and down for some non-fixed period in
the autograph. We may also see that this time series data is in monthly
frequency. One unusual period could be 2008-2010, when there is a big
decrease in the Employed variable. Something might happen at that
period, such as the famous financial crisis.
autoplot(aus_production, Bricks) + labs(title = "Bricks")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_season(aus_production, Bricks) + labs(title = "Bricks")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_subseries(aus_production, Bricks) + labs(title = "Bricks")
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_lag(aus_production, Bricks) + labs(title = "Bricks")
## Warning: Removed 20 rows containing missing values (gg_lag).
autoplot(ACF(aus_production, Bricks)) + labs(title = "Bricks")
By looking at the autoplot graph, we indeed can discover seasonality of
the time series within years. We can also find out cyclicity, as the
curve goes up and down in unfixed time periods. Linear trends in this
case might not be appropriate, but quadratic trends might be useful. We
can also find out that this series is in quarterly frequency. By
examining the seasonal plot and the subseries plot, we can see that the
Bricks variable indeed changes in some regular way throughout quarters
in a year. For example, the number of Bricks in the second and third
quarter is higher than the rest. This may imply a higher demand of
bricks during that period. These are all signs of seasonal patterns.
Periods 1973-1975 and 1982-1983 are quire unusual, because the number of
Bricks drop at a very high degree in these periods.
autoplot(pelt, Hare) + labs(title = "Hare")
gg_subseries(pelt, Hare) + labs(title = "Hare")
gg_lag(pelt, Hare) + labs(title = "Hare")
autoplot(ACF(pelt, Hare)) + labs(title = "Hare")
Due
to some reasons, gg_season() doesn’t work here, but we can also make
conclusions on seasonality based on the other four graphs. By checking
the autograph and the lag plot, it’s hard to say that any seasonality
exists. This mighe be because of the time frequency we choose. Yearly
data might not be satisfying. It’s also very hard to determine a trend,
as it might only be a flat line. But it seems like some cyclicity
exists. It’s quite a common pattern that number of Hare increases a lot
in a short period then decreases a lot over time. This might relate to
the habit pattern of Hare. It seems like there are no unusual years, as
a big increase and decrease in number of Hare is a common pattern.
H02 = PBS %>%
filter(ATC2 == "H02") %>%
ungroup()
autoplot(H02, Cost) + labs(title = "H02 Cost")
gg_season(H02, Cost) + labs(title = "H02 Cost")
gg_subseries(H02, Cost) + labs(title = "H02 Cost")
autoplot(ACF(H02, Cost)) + labs(title = "H02 Cost")
Because our dataset has more than 1 index/group, gg_lag() cannot handle
this case. However, we can still use the rest 4 graphs to figure out if
seasonality presents in our time series data. Since the data has been
grouped into 4 groups, we should see the patterns of each of them. We
can see that our data is in monthly frequency. By checking the
autograph, we can see strong seasonality in three of them except the
group General/Co-Payments. Indeed, as we check the seasonal plot, this
group has the most chaotic curves, which doesn’t show evidence of
seasonality. All of them don’t have a clear sign of cyclicity, and it’s
hard to determine the trend for all of them except the group of
Concessional/Co-payments, which may have an upward trend. This series
tells us that costs among different groups are different, which is a
reasonable conclusion. One unusual thing is that the volatility for 2
groups of Concessional are higher than the other two groups. This might
be related to the concession type.
autoplot(us_gasoline, Barrels) + labs(title = "Barrels of Oil")
gg_season(us_gasoline, Barrels) + labs(title = "Barrels of Oil")
gg_subseries(us_gasoline, Barrels) + labs(title = "Barrels of Oil")
gg_lag(us_gasoline, Barrels) + labs(title = "Barrels of Oil")
autoplot(ACF(us_gasoline, Barrels)) + labs(title = "Barrels of Oil")
We can see that the time series has weekly frequency. It’s very hard to
see the seasonality from just looking at the graphs, as the frequency is
high, but we can still recognize some pattern of seasonality. If we look
at the ACF curve, it is the evidence of strong seasonality and strong
trend. The trend is therefore clear, which an upward linear line would
be appropriate. There are some signs of cyclicity as well. The data
might imply that the demand for barrels of oil changes over time in a
pattern, which might be related to the production. It seems like the
volatility of barrels first remains constant and high, but decreases
since 2004, and becomes high and constant again in about 2010. This
pattern is an unusual fact.
usa_data <- subset(global_economy, Country == "United States")
usa_data
autoplot(usa_data)+ labs(title = "USAGDP")
## Plot variable not specified, automatically selected `.vars = GDP`
global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP/Population) + labs(title = "United States GDP Per Capita")
Performed a population transformation to obtain per-capita data due to
the potential impact of population changes on GDP. The overall trend is
the same as the two plots are very similar.
Vic <- subset(aus_livestock, State == "Victoria" & Animal == "Bulls, bullocks and steers")
Vic %>%
autoplot(Count) + labs(title = "Slaughter of Vicorian Bulls, Bullocks, and Steers")
No
transformation
vic_elec %>% autoplot(Demand) + labs(title = "Victorian Electricity Demand")
vic_elec %>%
group_by(Date) %>%
index_by(Date = yearweek(Time)) %>%
summarise(Demand = sum(Demand)) %>%
autoplot(Demand) + labs(title= "Weekly Victorian Electricity Demand", y = "$US (in trillions)")
Performed a calendar transformation to reflect weekly demand rather than
half-hourly demand. Plotting a point for every 30 minutes makes the plot
difficult to interpret because it is so cluttered that seasonality in
particular is hard to observe. Plotting weekly electricity demand
results in a much cleaner plot such that it is easier to see the
seasonality and variation in weeks.
aus_production %>% autoplot(Gas)
lambda <- aus_production %>% features(Gas, features = guerrero) %>% pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Gas, lambda)) + labs(y = "", title = "Transformed Gas Production (lambd = 0.11)")
The
variation increases with the level of the series, so a box-cox
transformation helps to make all the variances similar across the whole
series.
set.seed(123)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
x11_dcmp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp) +
labs(title = "Decomposition of Retail Turnover using X-11")
There are a
few outliers that is obvious from looking at the spikes in the irregular
plot, the most significant one is the one in 2001. Another observation
is that the seasonality decreaases over time.
There is an overall increasing trend in the number of persons in the civilian labor force in Australia. However, there were decline in the labor force around 1991 and 1992 by the significant decrease in the remainder plot, which main be due to a recession. There is some level of sesonality, although the scale of seasonality is insignificant compared to trend, so the seasonality does not have much of an influence. There is some cycles present, although insignificant.
The recession of 1991/1992 is visible in the estimated components.